mmg_233_2013_genetics_genomicswikiaorg-20200214-history
Haplotype Specific mRNA Quantification
This technique takes advantage of RNA-seq (high-throughput RNA sequencing) to look at haplotype- and isoform-specific expression of the mRNAs encoding a protein. It can detect not only a shift in the balance of isoform expression but also which haplotypes are present.
Advantages of RNA-seq
Unlike microarrays, RNA-seq provides coverage of a wide range of known and unknown SNPs (1). RNA-seq also gives a 'snapshot' of the total RNA, or of individual RNA subsets (mRNA, tRNA, etc.), made by the cell at a given point in time. In addition to sequencing the RNA, it allows the abundance of specific RNAs to be estimated (2).
Isoform Specific mRNA Quantification
To quantify the isoforms of a protein present within a cell, the reads generated by RNA-seq can be mapped to the genome. Reads often overlap exon boundaries, and this information can be used to identify the isoforms present (created by alternative splicing of the gene). The abundance of each isoform can then be estimated from the number of reads that are compatible with it. Computer programs such as Cufflinks perform this estimation (2).
Cufflinks can also assemble the isoforms themselves by identifying pairs of fragments that are incompatible (2). Incompatible fragments must belong to different isoforms. An overlap graph is then built with each fragment as a node (2). Fragments that are connected in the graph overlap in the genome and are compatible, so they may belong to the same isoform (2). A chain of connected fragments is called a 'path' (2), and each path contains fragments that could be used to construct a complete isoform (2). Cufflinks looks for the minimum number of paths needed to cover every fragment, so the overlap graph indicates how many isoforms of the protein of interest there may be.
Haplotype Specific mRNA Quantification
If the parental transcriptome sequences are available and recombination has no effect on the haplotypes, i.e. a specific combination of alleles is not lost through recombination events, haplotype-specific mRNA quantification can be performed (3). Turro et al. (2011) presented a new pipeline for estimating haplotype-, isoform- and gene-specific expression using the MMSEQ analysis method. The method uses all reads that can be mapped to at least one annotated transcript sequence. It also estimates the expression of the two haplotype-specific versions of each isoform individually, and can therefore detect asymmetric imbalances between isoforms of the same gene (3).
To test the method, the authors applied it to published RNA-seq data from mouse embryos of initial and reciprocal CAST/C57 crosses (3). Each sample was pooled from four individuals. Turro et al. (2011) used an existing C57 reference transcriptome to construct a CAST reference transcriptome. Aligning CAST SNPs to the C57 transcriptome allowed them to combine the two references into a hybrid reference containing two entries for each isoform whose sequence differed between C57 and CAST (3).
In a scatterplot of the resulting CAST/C57 imbalance estimates, they found a cluster of transcripts that showed CAST overexpression in the initial crosses but approximately balanced expression in the reciprocal crosses. This cluster consisted entirely of transcripts on the X chromosome, suggesting that the embryos in the initial crosses were male and those in the reciprocal crosses female. Another group of points, lying along the lower-left to upper-right diagonal of the scatterplot, showed consistent CAST/C57 differential expression regardless of the sex-strain combination of the parents, which could indicate cis regulation (3).
MMSEQ allows imbalances to be assessed at the transcript level rather than for individual SNPs. It is therefore not necessary to set arbitrary thresholds on the magnitude and significance of SNP-level imbalances in order to draw conclusions about transcript-level imbalances.
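The hybrid-reference idea above can be illustrated with a toy calculation. The Python sketch below is not MMSEQ (the paper describes its own statistical model); it swaps in a plain expectation-maximization (EM) loop, a common way of sharing multi-mapping reads among transcripts. It assumes reads have already been aligned to a hybrid reference in which each isoform has one entry per haplotype; the function names, entry names (`iso1|CAST` and so on), lengths and read counts are all hypothetical. Reads that overlap a SNP map to only one haplotype's entry, while reads from SNP-free shared sequence map to several entries and are divided according to the current abundance estimates. A per-isoform log2(CAST/C57) ratio then gives a transcript-level imbalance without any per-SNP threshold.

```python
import math
from collections import defaultdict

# Simplified EM sketch (NOT MMSEQ's own estimator); all names are hypothetical.


def em_read_fractions(reads, lengths, n_iter=200):
    """Estimate the fraction of reads originating from each hybrid-reference entry.

    reads   -- list of sets; each set names the entries a read aligned to
    lengths -- dict mapping every entry to its (effective) length
    """
    entries = sorted(lengths)
    alpha = {t: 1.0 / len(entries) for t in entries}  # uniform starting guess
    for _ in range(n_iter):
        counts = defaultdict(float)
        for hits in reads:
            # E-step: split this read across the entries it maps to, in
            # proportion to the current read fraction divided by entry length
            weights = {t: alpha[t] / lengths[t] for t in hits}
            total = sum(weights.values())
            for t, w in weights.items():
                counts[t] += w / total
        # M-step: expected read counts become the new read fractions
        alpha = {t: counts[t] / len(reads) for t in entries}
    return alpha


def haplotype_log2_ratios(alpha, lengths, isoforms, hap_a="CAST", hap_b="C57"):
    """log2(hap_a / hap_b) expression for each isoform's two haplotype entries."""
    ratios = {}
    for iso in isoforms:
        a, b = f"{iso}|{hap_a}", f"{iso}|{hap_b}"
        # divide by length so the comparison is per unit of transcript sequence
        expr_a = alpha[a] / lengths[a]
        expr_b = alpha[b] / lengths[b]
        ratios[iso] = math.log2(expr_a / expr_b)
    return ratios


# Toy hybrid reference (hypothetical): two isoforms, each with a CAST and a C57 entry.
lengths = {"iso1|CAST": 1000, "iso1|C57": 1000, "iso2|CAST": 2000, "iso2|C57": 2000}

reads = (
    [{"iso1|CAST"}] * 30    # carry the CAST allele at a SNP in an iso1-only exon
    + [{"iso1|C57"}] * 10   # carry the C57 allele at the same SNP
    + [{"iso2|CAST"}] * 10
    + [{"iso2|C57"}] * 10
    + [set(lengths)] * 40   # shared exon without SNPs: map to all four entries
)

alpha = em_read_fractions(reads, lengths)
ratios = haplotype_log2_ratios(alpha, lengths, ["iso1", "iso2"])
for iso, r in ratios.items():
    print(f"{iso}: log2(CAST/C57) = {r:.3f}")
```

On this toy input the two iso1 entries settle at a 3:1 split (a log2 ratio of about 1.585), matching the ratio of its SNP-informative reads, while iso2 stays balanced at 0 even though 40 ambiguous reads are shared across all four entries.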
For genes containing heterozygous SNPs with opposing imbalances, the approach used when the data were originally published was to scan the transcript annotations to identify isoform structures consistent with the observed SNP positions and imbalances (3). In that analysis, these genes were classed as 'complex' provided at least one SNP was significant. H13 was used as an example of a complex gene, since it has two short isoforms and three longer isoforms with several additional exons towards the 3' end (3). The original analysis found that heterozygous SNPs in the 3' exons of the short isoforms had a paternal bias, while those in the 3' and intermediate exons of the longer isoforms had a maternal bias (3). Using MMSEQ, Turro et al. (2011) were able to discern this by quantifying the haplotypes directly: the two short isoforms were imbalanced towards the paternally inherited haplotype, and two of the long isoforms were imbalanced towards the maternally inherited haplotype. In addition, a novel gene within the boundaries of H13 was also found to be paternally overexpressed.
Conclusions
Turro et al. (2011) demonstrated that new RNA-seq analysis techniques can be used to identify haplotypes and assess the imbalance between maternally and paternally inherited isoforms. When applied to a published data set, their pipeline and the MMSEQ method not only replicated the findings originally reported from those data but also detected cis-regulated transcripts. The pipeline and the MMSEQ software are available online.
References
1. Wikipedia (2013) RNA-Seq. http://en.wikipedia.org/wiki/RNA-Seq
2. Trapnell C, et al. (2010) Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology 28(5):511-515.
3. Turro E, et al. (2011) Haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads. Genome Biology 12(2):R13.